Improving Word Sense Induction by Exploiting Semantic Relevance
Abstract
Word Sense Induction (WSI) is the task of automatically inducing the different senses of a target word from unannotated text. Traditional approaches based on the vector space model (VSM) represent each context of a target word as a vector of selected features (e.g. the words occurring in the context). These approaches treat the words in a context as independent and therefore do not exploit the semantic relevance between them. In this paper we propose a WSI method that exploits semantic relevance between words by incorporating a word graph into the framework for clustering context vectors. The method is evaluated on the test data of the Chinese Word Sense Induction task of the first CIPS-SIGHAN Joint Conference on Chinese Language Processing (CLP2010). Experimental results show that our method significantly outperforms the baseline methods.
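The independence assumption mentioned in the abstract can be illustrated with a small sketch. The abstract does not describe the paper's actual algorithm, so everything below is an assumption made for illustration: the toy word graph, the helper `spread_over_graph`, and the mixing parameter `alpha` are hypothetical and are not taken from the paper. The sketch only shows why two contexts with no words in common look unrelated under a plain bag-of-words VSM, and how a word graph could make related contexts comparable before clustering.

```python
from collections import Counter
from math import sqrt


def bag_of_words(tokens):
    """Plain VSM context vector: each word is an independent dimension."""
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def spread_over_graph(vec, graph, alpha=0.5):
    """Hypothetical smoothing step (not the paper's method): pass a share
    `alpha` of each word's weight to its neighbours in the word graph,
    split in proportion to the edge weights."""
    out = Counter()
    for word, weight in vec.items():
        neighbours = graph.get(word, {})
        total = sum(neighbours.values())
        if total == 0:
            # Word has no neighbours in the graph: keep its full weight.
            out[word] += weight
            continue
        out[word] += (1 - alpha) * weight
        for other, edge in neighbours.items():
            out[other] += alpha * weight * edge / total
    return out


# Toy word graph (assumed): edge weights stand in for semantic relevance.
graph = {
    "loan": {"deposit": 1.0},
    "deposit": {"loan": 1.0},
    "river": {"water": 1.0},
    "water": {"river": 1.0},
}

ctx_a = bag_of_words(["loan"])
ctx_b = bag_of_words(["deposit"])
ctx_c = bag_of_words(["river"])

# Plain VSM: the two contexts share no word, so they look unrelated.
plain_ab = cosine(ctx_a, ctx_b)

# After spreading weight over the graph, related contexts overlap,
# while contexts from an unconnected part of the graph still do not.
smooth_ab = cosine(spread_over_graph(ctx_a, graph), spread_over_graph(ctx_b, graph))
smooth_ac = cosine(spread_over_graph(ctx_a, graph), spread_over_graph(ctx_c, graph))
```

In this toy setting, any standard clustering algorithm applied to the smoothed vectors would be able to group `ctx_a` with `ctx_b`, which the plain vectors alone give it no basis for doing.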
Related resources
Improving Retrieval Experience Exploiting Semantic Representation of Documents
The traditional strategy performed by Information Retrieval (IR) systems is ranked keyword search: for a given query, a list of documents, ordered by relevance, is returned. Relevance computation is primarily driven by a basic string-matching operation. To date, several attempts have been made to deviate from the traditional keyword search paradigm, often by introducing some techniques to captu...
Is Hesaabdaaree an Adequate Equivalent for Accounting?
Some of the difficulties and misunderstandings that arise in accounting theory, practice, regulation and education are grounded in language and linguistics. As an illustration, the Persian equivalent of 'accounting' is analysed linguistically to reveal how a mistranslation may cause difficulties in understanding and improving Iranian accounting. This paper shows how Hesaabdaaree is not ...
Pattern abstraction and term similarity for Word Sense Disambiguation: IRST at Senseval-3
This paper summarizes IRST’s participation in Senseval-3. We participated both in the English all-words task and in some lexical sample tasks (English, Basque, Catalan, Italian, Spanish). We followed two perspectives. On the one hand, for the all-words task, we tried to refine the Domain Driven Disambiguation that we presented at Senseval-2. The refinements consist of both exploiting a new technique ...
Dynamic and Static Prototype Vectors for Semantic Composition
Compositional Distributional Semantic methods model the distributional behavior of a compound word by exploiting the distributional behavior of its constituent words. In this setting, a constituent word is typically represented by a feature vector conflating all the senses of that word. However, not all the senses of a constituent word are relevant when composing the semantics of the compound. ...
UNIBA-SENSE at CLEF 2008: SEmantic N-levels Search Engine
This paper presents evaluation experiments conducted at the University of Bari for the Ad-Hoc Robust WSD task of the Cross-Language Evaluation Forum (CLEF) 2008. The evaluation was performed using SENSE (SEmantic N-levels Search Engine) [2]. SENSE tries to overcome the limitations of the ranked keyword approach by introducing semantic levels, which integrate (and not simply replace) the lexical...